-
-
Notifications
You must be signed in to change notification settings - Fork 169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH : improvement csv processing #517
Conversation
… to clear the CSV file and save a CSV without header
…he dataset by droping NaN values. I added my functions in tools.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work! I liked the way you separated the file handler function into different functions within the tools.py module
This is my first review, I will later check documentation and tests, but you can already work on the suggestions.
Btw I think this function can be used in other parts of the code as well.
cleaned_data.csv
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you commit this file by mistake?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I forgot to delete it, sorry!
Co-authored-by: Gui-FernandesBR <[email protected]>
Co-authored-by: Gui-FernandesBR <[email protected]>
…ocketPy-Team/RocketPy into enh/improvement-csv-processing
Could you elaborate a bit more on why is this implementation an advantage over the current one, forgive me if I am mistaken, but my interpretation is that:
This seems to the same thing to me, but in the first case we can delegate to We had recently a PR (#485 and #495) to handle CSV headers and there is even a test for it: |
Hello @phmbressan, this code is not only able to skip the first row but also to drop NaN values which it is not the case in the current code. Moreover, the current code is not very efficient dealing with these cases (from my perspective). Try with the following CSV file that @Gui-FernandesBR gave me. The current code throws an error whereas mine deals perfectly with it. Let me know if you want a complete report on this code! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks to your clarifications @kalounis , now I understand the reasoning behind this PR. In fact, processing NaN
values can be quite useful.
As I commented, we just need to make sure to inform if there were any problems on reading the source. After that, we are likely good to go.
# Save the processed data to a new CSV file | ||
with open(output_path, "w", encoding="utf-8") as output_file: | ||
writer = csv.writer(output_file, delimiter=",") | ||
writer.writerows(data_no_headers) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I don't think writing a new file with the processed data is really needed here.
More important, I think, is that we must make it clear to the user that not everything of their file was used (because there were NaN
). So, could you trigger a warning
that informs if any lines were skipped on source processing.
The implementation of this warning is up to you, but I believe that a simple boolean
that is set to True
in the for
loop above if there were any skipped lines is enough. Then, if this boolean
is True
raise the warning
.
There are other places in rocketpy
that we raise warning
s if you want to base the implementation on that. Of course, should you have any doubts don't hesitate in commenting.
tests not passing kinda worries me a bit but I couldn't find enough time to investigate it. |
If the problem we are trying to solve is to read .csv files that have NaN values, I think we should consider Changning the This function is a bit more complex and can Relying in numpy may be more beneficial than using the full pythonic approach, both in terms of maintenance and speed. But this is something that can be discussed. |
Closed as not finished. |
Pull request type
Checklist
black rocketpy/ tests/
) has passed locallypytest --runslow
) have passed locallyCHANGELOG.md
has been updated (if relevant)Current behavior
The current code isn't able to deal with CSV files that contain a header.
New behavior
Now, when we create an instance of the Function Class, we can give for argument a CSV file with a header : the code will process this CSV file (drop NaN values) and will be able to deal with the header.
Breaking change
Additional information
Enter text here...